93 research outputs found

    Throughput constrained parallelism reduction in cyclo-static dataflow applications

    This paper deals with semantics-preserving parallelism reduction methods for cyclo-static dataflow applications. Parallelism reduction is the process of fusing equivalent actors. Its principal objectives are to decrease the memory footprint of an application and to increase its execution performance. We focus on parallelism reduction methodologies constrained by application throughput. A generic parallelism reduction methodology is introduced. Experimental results are provided to assess the performance of the proposed method.

    A linear programming approach to general dataflow process network verification and dimensioning

    In this paper, we present linear programming-based sufficient conditions, some of them polynomial-time, to establish the liveness and memory boundedness of general dataflow process networks. Furthermore, this approach can be used to obtain safe upper bounds on the size of the channel buffers of such a network. Comment: In Proceedings ICE 2010, arXiv:1010.530

    Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures

    The dataflow programming model has proven to be a relevant approach to efficiently run massively parallel applications over many-core architectures. In this model, some particular built-in agents are in charge of data reorganizations between user agents. Such agents can Split, Join and Duplicate data onto their communication ports. They are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can stand on the critical path (think about Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways that may lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated over the Kalray MPPA 256-core processor.

    A probabilistic design for practical homomorphic majority voting with intrinsic differential privacy

    As machine learning (ML) has become pervasive throughout various fields (industry, healthcare, social networks), privacy concerns regarding the data used for its training have gained critical importance. In settings where several parties wish to collaboratively train a common model without jeopardizing their sensitive data, the need for a private training protocol is particularly stringent and requires protecting the data against both the model's end-users and the other actors of the training phase. In this context of secure collaborative learning, Differential Privacy (DP) and Fully Homomorphic Encryption (FHE) are two complementary countermeasures of growing interest to thwart privacy attacks in ML systems. Central to many collaborative training protocols, in the line of PATE, is majority voting aggregation. Thus, in this paper, we design SHIELD, a probabilistic approximate majority voting operator which is faster when homomorphically executed than existing approaches based on exact argmax computation over a histogram of votes. As an additional benefit, the inaccuracy of SHIELD is used as a feature to provably enable DP guarantees. Although SHIELD may have other applications, we focus here on one setting and seamlessly integrate it in the SPEED collaborative training framework [grivet2021speed] to improve its computational efficiency. After thoroughly describing the FHE implementation of our algorithm and its DP analysis, we present experimental results. To the best of our knowledge, it is the first work in which relaxing the accuracy of an algorithm is constructively usable as a degree of freedom to achieve better FHE performance.

    Stochastic graph partitioning: quadratic versus SOCP formulations

    We consider a variant of the graph partitioning problem involving knapsack constraints with Gaussian random coefficients. In this new variant, under this assumption of probability distribution, the problem can be traditionally formulated as a binary SOCP for which the continuous relaxation is convex. In this paper, we reformulate the problem as a binary quadratically constrained program for which the continuous relaxation is not necessarily convex. We propose several linearization techniques for the latter: the classical linearization proposed by Fortet (Trabajos de Estadistica 11(2):111–118, 1960) and the linearization proposed by Sherali and Smith (Optim Lett 1(1):33–47, 2007). In addition to the basic implementation of the latter, we propose an improvement which includes, in the computation, constraints coming from the SOCP formulation. Numerical results show that an improvement of Sherali–Smith’s linearization largely outperforms the binary SOCP program and the classical linearization when investigated in a branch-and-bound approach.

    Practical Multi-Key Homomorphic Encryption for More Flexible and Efficient Secure Federated Aggregation (preliminary work)

    In this work, we introduce a lightweight communication-efficient multi-key approach suitable for the Federated Averaging rule. By combining secret-key RLWE-based HE, additive secret sharing and PRFs, we reduce the communication cost per party by approximately half when compared to the usual public-key instantiations, while keeping practical homomorphic aggregation performance. Additionally, for LWE-based instantiations, our approach reduces the communication cost per party from quadratic to linear in terms of the lattice dimension.
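
    Additive secret sharing, one of the ingredients named in this abstract, can be illustrated with a minimal sketch. This is not the paper's construction: it omits the RLWE encryption and the PRFs entirely, and the modulus `Q` and the function names are assumptions chosen for illustration. It only shows why masks that sum to zero hide individual inputs while leaving the aggregate intact.

```python
import secrets

# Hypothetical modulus for the toy example; not taken from the paper.
Q = 2**32

def zero_sum_shares(n, q=Q):
    """Return n random values in [0, q) whose sum is 0 modulo q."""
    shares = [secrets.randbelow(q) for _ in range(n - 1)]
    shares.append((-sum(shares)) % q)
    return shares

def mask_inputs(inputs, q=Q):
    """Each party adds its own share to its input before sending it."""
    shares = zero_sum_shares(len(inputs), q)
    return [(x + r) % q for x, r in zip(inputs, shares)]

def aggregate(masked, q=Q):
    """The aggregator sums the masked values; the masks cancel out."""
    return sum(masked) % q

if __name__ == "__main__":
    inputs = [3, 5, 7]
    masked = mask_inputs(inputs)
    print(aggregate(masked) == sum(inputs) % Q)
```

    In a real deployment the shares would have to be established without a trusted dealer; the single call to `zero_sum_shares` here is a simplification.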

    DFA on LS-Designs with a Practical Implementation on SCREAM (extended version)

    LS-Designs are a family of SPN-based block ciphers whose linear layer is based on the so-called interleaved construction. They are intended for low-end devices with high-performance and low-resource constraints, objects which need to be resistant to physical attacks. In this paper we describe a complete Differential Fault Analysis against LS-Designs and also against other families of SPN-based block ciphers. First we explain how fault attacks can be used against their implementations depending on fault models. Then, we validate the DFA in a practical example on a hardware implementation of SCREAM running on an FPGA. The faults have been injected using electromagnetic pulses during the execution of SCREAM and the faulty ciphertexts have been used to recover the key bits. Finally, we discuss some countermeasures that could be used to thwart such attacks.

    Towards Better Availability and Accountability for IoT Updates by means of a Blockchain

    Building the Internet of Things requires deploying a huge number of devices with full or limited connectivity to the Internet. Given that these devices are exposed to attackers and generally not secured-by-design, it is essential to be able to update them, to patch their vulnerabilities and to prevent hackers from enrolling them into botnets. Ideally, the update infrastructure should implement the CIA triad properties, i.e., confidentiality, integrity and availability. In this work, we investigate how the use of a blockchain infrastructure can meet these requirements, with a focus on availability.

    Stream ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression

    In typical applications of homomorphic encryption, the first step consists of Alice encrypting some plaintext m under Bob’s public key pk and sending the ciphertext c = HE_pk(m) to some third-party evaluator Charlie. This paper specifically considers that first step, i.e. the problem of transmitting c as efficiently as possible from Alice to Charlie. As previously noted, a form of compression is achieved using hybrid encryption. Given a symmetric encryption scheme E, Alice picks a random key k and sends a much smaller ciphertext c′ = (HE_pk(k), E_k(m)) that Charlie decompresses homomorphically into the original c using a decryption circuit C_{E^-1}. In this paper, we revisit that paradigm in light of its concrete implementation constraints; in particular E is chosen to be an additive IV-based stream cipher. We investigate the performance offered in this context by Trivium, which belongs to the eSTREAM portfolio, and we also propose a variant with 128-bit security: Kreyvium. We show that Trivium, whose security has been firmly established for over a decade, and the new variant Kreyvium offer excellent performance.
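
    The symmetric half of this hybrid scheme, E_k(m) with an additive IV-based stream cipher, can be sketched as follows. This is an illustration only: the keystream generator below is a hypothetical stand-in built from SHA-256 in counter mode, not Trivium or Kreyvium, and the homomorphic layer HE_pk is left out entirely. The point is that encryption and decryption are the same XOR with the keystream, so Charlie only needs to evaluate the keystream generation homomorphically to undo E_k.

```python
import hashlib

def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key, IV and a block counter.
    A stand-in for a real stream cipher, not a secure design choice."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:n])

def xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Additive stream cipher: XOR the data with the keystream.
    Applying it twice with the same key and IV returns the input."""
    ks = keystream(key, iv, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

if __name__ == "__main__":
    key, iv = bytes(16), bytes(12)  # fixed all-zero values, for the demo only
    message = b"plaintext m"
    ciphertext = xor_stream(key, iv, message)
    print(xor_stream(key, iv, ciphertext) == message)
```

    The ciphertext has the same length as the message, which is where the compression comes from: only the short key k has to be sent under the homomorphic scheme.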

    At Last! A Homomorphic AES Evaluation in Less than 30 Seconds by Means of TFHE

    Since the pioneering work of Gentry, Halevi, and Smart in 2012, the state of the art on transciphering has moved away from work on AES to focus on new symmetric algorithms that are better suited for a homomorphic execution. Yet, with recent advances in homomorphic cryptosystems, the question arises as to where we stand today, especially since AES execution is the application that may be chosen by NIST in the FHE part of its future call for threshold encryption. In this paper, we propose an AES implementation using TFHE programmable bootstrapping which runs in less than a minute on an average laptop. We detail the transformations carried out on the original AES code to lead to a more efficient homomorphic evaluation and we also give several execution times on different machines, depending on the type of execution (sequential or parallelized). These times vary from 4.5 minutes (resp. 54 seconds) for sequential (resp. parallel) execution on a standard laptop down to 28 seconds for a parallelized execution over 16 threads on a multi-core workstation.